Abstract: A simple multivariate version of Costa's entropy 
power inequality is proved. In particular, it is shown that if 
independent white Gaussian noise is added to an arbitrary 
multivariate signal, the entropy power of the resulting random 
variable is a multidimensional concave function of the individual 
variances of the components of the signal. As a side result, we 
also give an expression for the Hessian matrix of the entropy 
and entropy power functions with respect to the variances of the 
signal components, which is an interesting result in its own right. 

I. Introduction 

The entropy power of the random vector $Y \in \mathbb{R}^n$ was first 
introduced by Shannon in his seminal work [1] and has since been 
defined as 

$$N(Y) = \frac{1}{2\pi e} \exp\left(\frac{2}{n} h(Y)\right), \tag{1}$$

where $h(Y)$ represents the differential entropy, which, for 
continuous random vectors, reads as$^1$ 

$$h(Y) = -\mathsf{E}\{\log P_Y(Y)\}.$$

For the case where the distribution of $Y$ assigns positive 
mass to one or more singletons in $\mathbb{R}^n$, the above definition 
is extended with $h(Y) = -\infty$. 

The entropy power of a random vector $Y$ represents the 
variance (or power) of a standard Gaussian random vector 
$Y_G \sim \mathcal{N}(0, \sigma^2 I_n)$ such that both $Y$ and $Y_G$ have identical 
differential entropy, $h(Y_G) = h(Y)$. 

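As a sanity check of definition (1), the following minimal sketch evaluates the entropy power of a white Gaussian vector from its closed-form differential entropy and recovers the per-component variance. The function names and the numerical values are illustrative assumptions, not part of the original text.

```python
import math

def gaussian_entropy(n, sigma2):
    """Differential entropy (in nats) of an n-dimensional N(0, sigma2 * I)."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e * sigma2)

def entropy_power(h, n):
    """Entropy power as in (1): exp(2 h / n) / (2 pi e)."""
    return math.exp(2.0 * h / n) / (2.0 * math.pi * math.e)

# Hypothetical dimension and variance, chosen only for illustration.
n, sigma2 = 4, 2.5
recovered = entropy_power(gaussian_entropy(n, sigma2), n)
print(abs(recovered - sigma2) < 1e-9)
```

The check only covers the Gaussian case, for which the entropy is known in closed form; for a general distribution $h(Y)$ has to be computed or estimated separately.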
A. Shannon's entropy power inequality (EPI) 

For any two independent arbitrary random vectors $X \in \mathbb{R}^n$ 
and $W \in \mathbb{R}^n$, Shannon gave in [1] the following inequality: 

$$N(X + W) \geq N(X) + N(W).$$

The first rigorous proof of Shannon's EPI was given in [2] 
by Stam, and was simplified by Blachman in [3]. A simple 
and very elegant proof by Verdú and Guo, based on 
estimation-theoretic considerations, has recently appeared in [4]. 

Among many other important results, Bergmans' proof of 
the converse for the degraded Gaussian broadcast channel [5] 
and Oohama's partial solution to the rate distortion region 
problem for Gaussian multiterminal source coding systems [6] 
follow from Shannon's EPI. 
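For intuition, Shannon's EPI can be checked numerically in the special case where both vectors are Gaussian. In that case the entropy power of $\mathcal{N}(0, \Sigma)$ equals $|\Sigma|^{1/n}$, so the EPI reduces to the Minkowski determinant inequality. The sketch below assumes $n = 2$ and randomly drawn covariance matrices; all names are illustrative.

```python
import math
import random

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def random_cov2(rng):
    """Random 2x2 covariance C C^T plus a small ridge to keep it nonsingular."""
    c = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
    return [[sum(c[i][k] * c[j][k] for k in range(2)) + (0.1 if i == j else 0.0)
             for j in range(2)] for i in range(2)]

def gaussian_entropy_power(cov):
    """Entropy power of N(0, cov) for n = 2, i.e. |cov|^(1/2)."""
    return math.sqrt(det2(cov))

def epi_gap(cov_x, cov_w):
    """N(X + W) - N(X) - N(W) for independent Gaussian X and W."""
    cov_sum = [[cov_x[i][j] + cov_w[i][j] for j in range(2)] for i in range(2)]
    return (gaussian_entropy_power(cov_sum)
            - gaussian_entropy_power(cov_x)
            - gaussian_entropy_power(cov_w))

rng = random.Random(0)
gaps = [epi_gap(random_cov2(rng), random_cov2(rng)) for _ in range(1000)]
print(min(gaps) >= -1e-9)  # the gap should never be negative beyond rounding
```

This only exercises the Gaussian case; the inequality itself holds for arbitrary independent vectors, as stated above.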

$^1$Throughout this paper we work with natural logarithms. 



B. Costa's EPI 

Under the setting of Shannon's EPI, Costa proved in [7] 
that, provided that the random vector $W$ is white Gaussian 
distributed, Shannon's EPI can be strengthened to 

$$N(X + \sqrt{t}\, W) \geq (1 - t) N(X) + t N(X + W), \tag{2}$$

where $t \in [0, 1]$. As Costa noted, the above EPI is equivalent 
to the concavity of the entropy power function $N(X + \sqrt{t}\, W)$ 
with respect to the parameter $t$, or, formally,$^2$ 

$$\frac{d^2}{dt^2} N(X + \sqrt{t}\, W) \leq 0. \tag{3}$$

Due to its inherent interest and to the fact that the proof by 
Costa was rather involved, simplified proofs of his result have 
been subsequently given in [8]-[11]. 
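Inequality (2) can be illustrated in closed form when $X$ is Gaussian with a non-white covariance $\Sigma$ and $W$ is white with unit variance, since then $N(X + \sqrt{t}\, W) = |\Sigma + t I|^{1/n}$. The following sketch assumes $n = 2$ and a hypothetical covariance matrix, and evaluates the gap between the two sides of (2) on a grid of values of $t$.

```python
import math

def entropy_power_gaussian(cov, t):
    """N(X + sqrt(t) W) for X ~ N(0, cov) in R^2 and W ~ N(0, I): |cov + t I|^(1/2)."""
    a, b, c = cov[0][0] + t, cov[0][1], cov[1][1] + t
    return math.sqrt(a * c - b * b)

def costa_gap(cov, t):
    """Left side minus right side of (2); nonnegative when (2) holds."""
    return (entropy_power_gaussian(cov, t)
            - (1.0 - t) * entropy_power_gaussian(cov, 0.0)
            - t * entropy_power_gaussian(cov, 1.0))

# Hypothetical non-white covariance, chosen only for illustration.
cov_x = [[2.0, 0.8], [0.8, 0.5]]
gaps = [costa_gap(cov_x, k / 20.0) for k in range(21)]
print(min(gaps) >= -1e-12)
```

For a white $\Sigma$ the function is linear in $t$ and (2) holds with equality, which is why a non-white example is used here.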

Additionally, in his paper Costa presented two extensions of 
his main result in (3). Precisely, he showed that the EPI is also 
valid when the Gaussian vector $W$ is not white, and also for 
the case where the parameter $t$ multiplies the arbitrarily 
distributed random vector $X$, 

$$\frac{d^2}{dt^2} N(\sqrt{t}\, X + W) \leq 0. \tag{4}$$

Similarly to Shannon's EPI, Costa's EPI has been used 
successfully to derive important information-theoretic results 
concerning, e.g., Gaussian interference channels in [12] or 
multi-antenna flat fading channels with memory in [13]. 

C. Aim of the paper 

Our objective is to extend the particular case in (4) of 
Costa's EPI to the multivariate case, allowing the real parameter 
$t \in \mathbb{R}$ to become a matrix $T \in \mathbb{R}^{n \times n}$, which, to the best 
of the authors' knowledge, has not been considered before. 

Beyond its theoretical interest, this study is motivated by the 
fact that the concavity of the entropy power with respect to $T$ 
implies the concavity of the entropy and mutual information 
quantities. This would be a very desirable property in 
optimization procedures, e.g., to design the linear precoder that 
maximizes the mutual information in the linear vector Gaussian 
channel with arbitrary input distributions. 

$^2$The equivalence between equations (2) and (3) is due to the fact that the 
function $N(X + \sqrt{t}\, W)$ is twice differentiable almost everywhere thanks to 
the smoothing properties of the added Gaussian noise. 

Consequently, we investigate the concavity of the function 

$$N(T^{1/2} X + W), \tag{5}$$

with respect to the symmetric matrix $T = T^{1/2} T^{T/2}$. 
Unfortunately, the concavity in $T$ of the entropy power can be 
easily disproved by finding simple counterexamples as in [14] 
or even through numerical computations of the entropy power. 
Knowing this negative result, we thus focus our study on 
the next possible multivariate candidate: a diagonal matrix. 
Our objective now is to study the concavity of 

$$N(\Lambda^{1/2} X + W), \tag{6}$$

w.r.t. the diagonal matrix $\Lambda = \mathrm{diag}(\lambda)$, with $[\Lambda]_{ii} = \lambda_i$. 
For the sake of notation, throughout this work we define 

$$Y = \Lambda^{1/2} X + W,$$

where we recall that the random vector $W$ is assumed 
to follow a white zero-mean Gaussian distribution and the 
distribution of the random vector $X$ is arbitrary. In particular, 
the distribution of $X$ is allowed to assign positive mass to one 
or more singletons in $\mathbb{R}^n$. Consequently, the results presented 
in Theorems 1 and 2 in Section III also hold for the case where 
the random vector $X$ is discrete. 

II. Mathematical preliminaries 

In this section we present a number of lemmas followed 
by a proposition that will prove useful in the proof of our 
multidimensional EPI. In our derivations, the identity matrix 
is denoted by $I$, the vector with all its entries equal to 1 is 
represented by $\mathbf{1}$, and $A \circ B$ represents the Hadamard (or 
Schur) element-wise matrix product. 

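Lemma 1 (Bhatia [15, p. 15]): Let $A \in \mathbb{S}^n_+$ be a positive 
semidefinite matrix, $A \geq 0$. Then it follows that 

$$\begin{bmatrix} A & A \\ A & A \end{bmatrix} \geq 0.$$

Proof: Since $A \geq 0$, consider $A = C C^T$ and write 

$$\begin{bmatrix} A & A \\ A & A \end{bmatrix} = \begin{bmatrix} C \\ C \end{bmatrix} \begin{bmatrix} C^T & C^T \end{bmatrix} \geq 0.$$ ■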



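Lemma 2 (Bhatia [15, Exercise 1.3.10]): Let $A \in \mathbb{S}^n_{++}$ be 
a positive definite matrix, $A > 0$. Then it follows that 

$$\begin{bmatrix} A & I \\ I & A^{-1} \end{bmatrix} \geq 0. \tag{7}$$

Proof: Consider again $A = C C^T$; then we have $A^{-1} = C^{-T} C^{-1}$. 
Now, simply write the matrix in (7) as 

$$\begin{bmatrix} A & I \\ I & A^{-1} \end{bmatrix} = \begin{bmatrix} C & 0 \\ 0 & C^{-T} \end{bmatrix} \begin{bmatrix} I & I \\ I & I \end{bmatrix} \begin{bmatrix} C^T & 0 \\ 0 & C^{-1} \end{bmatrix},$$

which, from Sylvester's law of inertia for congruent matrices 
[15, p. 5] and Lemma 1, is positive semidefinite. ■ 
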
Lemma 3 (Schur Theorem): If the matrices $A$ and $B$ are 
positive semidefinite, then so is the product $A \circ B$. If both $A$ 
and $B$ are positive definite, then so is $A \circ B$. In other words, 
the class of positive (semi)definite matrices is closed under the 
Hadamard product. 

Proof: See [16, Th. 7.5.3] or [17, Th. 5.2.1]. ■ 

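Lemma 3 is easy to probe numerically. The sketch below draws random $2 \times 2$ positive semidefinite matrices, forms their Hadamard product, and checks that its smallest eigenvalue is nonnegative up to rounding. The restriction to $2 \times 2$ matrices is an assumption made so that the eigenvalue is available in closed form; the helper names are illustrative.

```python
import math
import random

def min_eig_sym2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)

def random_psd2(rng):
    """Random 2x2 positive semidefinite matrix of the form C C^T."""
    c = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
    return [[sum(c[i][k] * c[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two 2x2 matrices."""
    return [[a[i][j] * b[i][j] for j in range(2)] for i in range(2)]

rng = random.Random(0)
smallest = min(min_eig_sym2(hadamard(random_psd2(rng), random_psd2(rng)))
               for _ in range(1000))
print(smallest >= -1e-9)
```

A random search of this kind cannot prove the lemma, but a single negative eigenvalue would contradict it.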
Lemma 4 (Schur complement): Let the matrices $A \in \mathbb{S}^n_{++}$ 
and $B \in \mathbb{S}^m_{++}$ be positive definite, $A > 0$ and $B > 0$, and 
not necessarily of the same dimension. Then the following 
statements are equivalent: 

1) $\begin{bmatrix} A & D \\ D^T & B \end{bmatrix} \geq 0$, 
2) $B \geq D^T A^{-1} D$, 
3) $A \geq D B^{-1} D^T$, 

where $D \in \mathbb{R}^{n \times m}$ is any arbitrary matrix. 

Proof: See [16, Th. 7.7.6] and the second exercise 
following it or [18, Prop. 8.2.3]. ■ 

With the above lemmas at hand, we are now ready to prove 
the following proposition: 

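Proposition 5: Consider two positive definite matrices $A \in \mathbb{S}^n_{++}$ 
and $B \in \mathbb{S}^n_{++}$ of the same dimension, and let $D_A$ be a 
diagonal matrix containing the diagonal elements of $A$ (i.e., 
$D_A = A \circ I$). Then it follows that 

$$A \circ B^{-1} \geq D_A (A \circ B)^{-1} D_A. \tag{8}$$

Proof: From Lemmas 1, 2 and 3 it follows that 

$$\begin{bmatrix} A & A \\ A & A \end{bmatrix} \circ \begin{bmatrix} B & I \\ I & B^{-1} \end{bmatrix} = \begin{bmatrix} A \circ B & D_A \\ D_A & A \circ B^{-1} \end{bmatrix} \geq 0.$$

Now, from Lemma 4, the result follows directly. ■ 
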
Corollary 6: Let $A \in \mathbb{S}^n_{++}$ be a positive definite matrix. 
Then, 

$$d_A^T (A \circ A)^{-1} d_A \leq n, \tag{9}$$

where we have defined $d_A = D_A \mathbf{1} = (A \circ I) \mathbf{1}$ as a column 
vector with the diagonal elements of matrix $A$. 

Proof: Particularizing the result in Proposition 5 with 
$B = A$ and pre- and post-multiplying it by $\mathbf{1}^T$ and $\mathbf{1}$ we 
obtain 

$$\mathbf{1}^T (A \circ A^{-1}) \mathbf{1} \geq \mathbf{1}^T D_A (A \circ A)^{-1} D_A \mathbf{1}.$$

The result in (9) now follows straightforwardly from the fact 
$\mathbf{1}^T (A \circ A^{-T}) \mathbf{1} = n$ [19] (see also [18, Fact 7.6.10], [17, 
Lemma 5.4.2(a)]). Note that $A$ is symmetric and thus $A^T = A$ 
and $A^{-T} = A^{-1}$. ■ 

Remark 7: Note that the proof of Corollary 6 is based on 
the result of Proposition 5 in (8). An alternative proof could 
follow similarly from a different inequality by Styan in [20], 

$$R \circ R^{-1} + I \geq 2 (R \circ R)^{-1},$$

where $R$ is constrained to be a correlation matrix, $R \circ I = I$. 

Proposition 8: Consider now the positive semidefinite matrix 
$A \in \mathbb{S}^n_+$. Then, 

$$A \circ A \geq \frac{d_A d_A^T}{n}.$$

Proof: For the case where $A \in \mathbb{S}^n_{++}$ is positive definite, 
from (9) in Corollary 6 and Lemma 4 it follows that 

$$\begin{bmatrix} A \circ A & d_A \\ d_A^T & n \end{bmatrix} \geq 0.$$

Applying again Lemma 4 we get 

$$A \circ A \geq \frac{d_A d_A^T}{n}. \tag{10}$$

Now, assume that $A \in \mathbb{S}^n_+$ is positive semidefinite. We thus 
define $\epsilon > 0$ and consider the positive definite matrix $A + \epsilon I$. 
From (10), we know that 

$$(A + \epsilon I) \circ (A + \epsilon I) \geq \frac{d_{A + \epsilon I}\, d_{A + \epsilon I}^T}{n}.$$

Taking the limit as $\epsilon$ tends to 0, from continuity, the validity 
of (10) for positive semidefinite matrices follows. ■ 
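Proposition 8 can be spot-checked numerically. The sketch below assumes $n = 2$, draws random positive semidefinite matrices of rank one and rank two, and verifies that $A \circ A - d_A d_A^T / n$ has no negative eigenvalue beyond rounding error. All function names are illustrative.

```python
import math
import random

def min_eig_sym2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)

def random_psd2(rng, rank=2):
    """Random 2x2 positive semidefinite matrix C C^T with C of size 2 x rank."""
    c = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(2)]
    return [[sum(c[i][k] * c[j][k] for k in range(rank)) for j in range(2)]
            for i in range(2)]

def proposition8_gap(a):
    """A o A - d_A d_A^T / n for a 2x2 matrix A (so n = 2)."""
    n = 2
    d = [a[0][0], a[1][1]]  # diagonal of A stacked as a vector
    return [[a[i][j] ** 2 - d[i] * d[j] / n for j in range(2)] for i in range(2)]

rng = random.Random(1)
worst = min(min_eig_sym2(proposition8_gap(random_psd2(rng, rank)))
            for rank in (1, 2) for _ in range(500))
print(worst >= -1e-9)
```

The rank-one samples exercise the singular case, which the proof handles through the limiting argument in $\epsilon$.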

Finally, to end this section on mathematical preliminaries, 
we give a very brief overview of some basic definitions 
related to minimum mean-square error (MMSE) estimation. 
These definitions are useful in our further derivations due 
to the relation between the entropy and the MMSE unveiled 
in [21].$^3$ Next, we give a lemma concerning the positive 
semidefiniteness of a certain class of matrices closely related 
to MMSE estimation. 

Consider the setting described in the introduction, 
$Y = \Lambda^{1/2} X + W$. For a given realization of the observations 
vector $Y = y$, the MMSE estimator, $\hat{X}(y)$, is given by the 
conditional mean 

$$\hat{X}(y) = \mathsf{E}\{X \mid Y = y\}.$$

We now define the conditional MMSE matrix, $\Phi_X(y)$, as 
the mean-square error matrix conditioned on the fact that the 
received vector is equal to $Y = y$. Formally, 

$$\Phi_X(y) = \mathsf{E}\{(X - \hat{X}(y))(X - \hat{X}(y))^T \mid Y = y\} = \mathsf{E}\{X X^T \mid Y = y\} - \mathsf{E}\{X \mid Y = y\}\, \mathsf{E}\{X^T \mid Y = y\}. \tag{11}$$

From this definition, it is clear that $\Phi_X(y)$ is a positive 
semidefinite matrix. 

Now, the MMSE matrix $E_X$ can be calculated by averaging 
$\Phi_X(y)$ in (11) with respect to the distribution of vector $Y$ as 

$$E_X = \mathsf{E}\{\Phi_X(Y)\}. \tag{12}$$

We close this section with a final lemma. 

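Lemma 9: For a given random vector $X \in \mathbb{R}^n$, it follows 
that $\mathsf{E}\{X X^T\} \geq \mathsf{E}\{X\}\, \mathsf{E}\{X^T\}$. 

Proof: Simply note that 

$$\mathsf{E}\{X X^T\} - \mathsf{E}\{X\}\, \mathsf{E}\{X^T\} = \mathsf{E}\{(X - \mathsf{E}\{X\})(X - \mathsf{E}\{X\})^T\} \geq 0,$$

where the last inequality follows from the fact that the expectation 
operator preserves positive semidefiniteness. ■ 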

$^3$Strictly speaking, the relation found in [21] concerns the quantities of 
mutual information and MMSE, but it is still useful for our problem because 
the entropy $h(Y)$ and the mutual information $I(X; Y)$ have the same 
dependence on $\Lambda$ up to a constant additive term. 



III. Main result of the paper 

Having presented the mathematical preliminaries, in this 
section we give the main result of the paper, 
namely, the concavity of the entropy power function $N(Y)$ 
in (6) with respect to the diagonal elements of $\Lambda$. Prior to 
proving this result, we present a weaker result concerning 
the concavity of the entropy function $h(Y)$, which is key in 
proving the concavity of the entropy power. 

A. Warm up: An entropy inequality 

Theorem 1: Assume $Y = \Lambda^{1/2} X + W$, where $X$ is arbitrarily 
distributed and $W$ follows a zero-mean white Gaussian 
distribution. Then the entropy $h(Y)$ is a concave function of 
the diagonal elements of $\Lambda$, i.e., 

$$\nabla^2_\lambda h(Y) \leq 0.$$

Furthermore, the entries of the Hessian matrix of the entropy 
function $h(Y)$ with respect to $\lambda$ are given by 

$$[\nabla^2_\lambda h(Y)]_{ij} = -\frac{1}{2} \mathsf{E}\left\{ \left( \mathsf{E}\{X_i X_j \mid Y\} - \mathsf{E}\{X_i \mid Y\}\, \mathsf{E}\{X_j \mid Y\} \right)^2 \right\}, \tag{13}$$

which can be written more compactly as 

$$\nabla^2_\lambda h(Y) = -\frac{1}{2} \mathsf{E}\{\Phi_X(Y) \circ \Phi_X(Y)\}. \tag{14}$$

Proof: For the computations leading to (13) and (14), 
see Appendix I. Once the expression in (14) is obtained, 
concavity (or negative semidefiniteness of the Hessian matrix) 
follows straightforwardly from the fact that the matrix 
$\Phi_X(y)$ defined in (11) is positive semidefinite for all $y$, from Lemma 3, 
and from the fact that the expectation operator preserves 
semidefiniteness. ■ 
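Expression (13) can be checked numerically in the scalar case. The sketch below assumes $n = 1$ and a hypothetical input $X$ that is equiprobable on $\{-1, +1\}$, for which $\mathsf{E}\{X \mid Y = y\} = \tanh(\sqrt{\lambda}\, y)$ and hence $\Phi_X(y) = 1 - \tanh^2(\sqrt{\lambda}\, y)$. It compares the right-hand side of (13) with a finite-difference estimate of the second derivative of $h(Y)$, both obtained by numerical integration. The integration range, step sizes and function names are assumptions made for illustration.

```python
import math

def density(y, lam):
    """Density of Y = sqrt(lam) X + W, X uniform on {-1, +1}, W ~ N(0, 1)."""
    s = math.sqrt(lam)
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return 0.5 * (phi(y - s) + phi(y + s))

def integrate(f, lo=-12.0, hi=12.0, steps=4800):
    """Composite trapezoidal rule on [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * step)
    return total * step

def entropy(lam):
    """Differential entropy h(Y) in nats."""
    return integrate(lambda y: -density(y, lam) * math.log(density(y, lam)))

def hessian_formula(lam):
    """Right side of (13) for n = 1: -(1/2) E{Phi_X(Y)^2}."""
    s = math.sqrt(lam)
    return -0.5 * integrate(
        lambda y: density(y, lam) * (1.0 - math.tanh(s * y) ** 2) ** 2)

def hessian_finite_difference(lam, delta=1e-2):
    """Central second difference of h(Y) with respect to lam."""
    return (entropy(lam + delta) - 2.0 * entropy(lam)
            + entropy(lam - delta)) / delta ** 2

lam = 1.0  # hypothetical operating point
print(hessian_formula(lam), hessian_finite_difference(lam))
```

The two printed numbers should agree up to the discretization error of the finite difference, and both should be negative, in line with the concavity claimed in Theorem 1.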

B. Multivariate extension of Costa's EPI 

Theorem 2: Assume $Y = \Lambda^{1/2} X + W$, where $X$ is 
arbitrarily distributed and $W$ follows a zero-mean white 
Gaussian distribution. Then the entropy power $N(Y)$ is a 
concave function of the diagonal elements of $\Lambda$, i.e., 

$$\nabla^2_\lambda N(Y) \leq 0.$$

Moreover, the Hessian matrix of the entropy power function 
$N(Y)$ with respect to $\lambda$ is given by 

$$\nabla^2_\lambda N(Y) = \frac{N(Y)}{n} \left( \frac{d_{E_X} d_{E_X}^T}{n} - \mathsf{E}\{\Phi_X(Y) \circ \Phi_X(Y)\} \right), \tag{15}$$

where we recall that $d_{E_X}$ is a column vector with the diagonal 
entries of the matrix $E_X$ defined in (12). 

Proof: First, let us prove (15). From the definition of the 
entropy power in (1) and applying the chain rule we obtain 

$$\nabla^2_\lambda N(Y) = \frac{2 N(Y)}{n} \left( \frac{2}{n} \nabla_\lambda h(Y)\, \nabla_\lambda^T h(Y) + \nabla^2_\lambda h(Y) \right).$$

Now, replacing $\nabla_\lambda h(Y)$ by its expression from [21, Eq. (61)] 
and incorporating the expression for $\nabla^2_\lambda h(Y)$ calculated in 
(14), the result in (15) follows. 

Now that an explicit expression for the Hessian matrix has 
been obtained, we wish to prove that it is negative semidefinite. 
Note from (15) that, except for a positive factor, the Hessian 
matrix $\nabla^2_\lambda N(Y)$ is the sum of a rank-one positive semidefinite 
matrix and the Hessian matrix of the entropy, which is 
negative semidefinite according to Theorem 1. Consequently, 
the definiteness of $\nabla^2_\lambda N(Y)$ is unknown a priori, and some 
further developments are needed to determine it, which is what 
we do next. 

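Consider a family of positive semidefinite matrices $A \in \mathbb{S}^n_+$, 
characterized by a certain vector parameter $v$, $A = A(v)$. 
Applying Proposition 8 to each matrix in this family, we obtain 

$$A(v) \circ A(v) \geq \frac{d_{A(v)}\, d_{A(v)}^T}{n}. \tag{16}$$

Since (16) is true for all possible values of $v$, we have 

$$\mathsf{E}\{A(V) \circ A(V)\} \geq \frac{1}{n} \mathsf{E}\left\{ d_{A(V)}\, d_{A(V)}^T \right\}, \tag{17}$$

where now the parameter $v$ has been considered to be a 
random variable, $V$. Note that the distribution of $V$ is arbitrary 
and does not affect the validity of (17). From Lemma 9 we 
know that 

$$\mathsf{E}\left\{ d_{A(V)}\, d_{A(V)}^T \right\} \geq \mathsf{E}\left\{ d_{A(V)} \right\} \mathsf{E}\left\{ d_{A(V)}^T \right\},$$

from which it follows that 

$$\mathsf{E}\{A(V) \circ A(V)\} \geq \frac{1}{n} \mathsf{E}\left\{ d_{A(V)} \right\} \mathsf{E}\left\{ d_{A(V)}^T \right\}.$$

Since the operators $d_A$ and expectation commute, we finally 
obtain 

$$\mathsf{E}\{A(V) \circ A(V)\} \geq \frac{d_{\mathsf{E}\{A(V)\}}\, d_{\mathsf{E}\{A(V)\}}^T}{n}.$$

Identifying $A(V)$ with the random covariance error matrix 
$\Phi_X(Y)$ and using (12), the result in the theorem follows as 

$$\mathsf{E}\{\Phi_X(Y) \circ \Phi_X(Y)\} \geq \frac{d_{E_X} d_{E_X}^T}{n}$$

and $N(Y) \geq 0$. ■ 
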
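The Hessian expression (15) can be compared against finite differences in a case where everything is available in closed form. The sketch below assumes a Gaussian input with independent components $X_i \sim \mathcal{N}(0, s_i)$ and unit-variance noise, so that $N(Y) = \prod_i (1 + \lambda_i s_i)^{1/n}$ and $\Phi_X(y)$ is the constant matrix $\mathrm{diag}(s_i / (1 + \lambda_i s_i))$. The variances, the operating point and the function names are hypothetical.

```python
import math

def entropy_power(lams, s):
    """N(Y) for independent X_i ~ N(0, s_i) and W ~ N(0, I)."""
    n = len(lams)
    return math.prod(1.0 + l * v for l, v in zip(lams, s)) ** (1.0 / n)

def hessian_formula(lams, s):
    """Right side of (15) with Phi_X = diag(s_i / (1 + lam_i s_i))."""
    n = len(lams)
    e = [v / (1.0 + l * v) for l, v in zip(lams, s)]  # diagonal of E_X
    scale = entropy_power(lams, s) / n
    return [[scale * (e[i] * e[j] / n - (e[i] ** 2 if i == j else 0.0))
             for j in range(n)] for i in range(n)]

def hessian_numeric(lams, s, d=1e-3):
    """Central finite-difference Hessian of N(Y) with respect to lams."""
    n = len(lams)

    def shifted(i, j, a, b):
        v = list(lams)
        v[i] += a * d
        v[j] += b * d
        return entropy_power(v, s)

    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4.0 * d * d)
             for j in range(n)] for i in range(n)]

lams, s = [0.7, 1.9], [1.5, 0.4]  # hypothetical operating point and variances
formula, numeric = hessian_formula(lams, s), hessian_numeric(lams, s)
print(max(abs(formula[i][j] - numeric[i][j]) for i in range(2) for j in range(2)))
```

In this Gaussian example the Hessian turns out to be singular, which is consistent with $N(Y)$ being a geometric mean of affine functions of $\lambda$; the inequality in Theorem 2 is therefore tight here.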
IV. Conclusion 

In this paper we have proved that, for $Y = \Lambda^{1/2} X + W$, 
the functions $N(Y)$ and $h(Y)$ are concave with respect 
to the diagonal entries of $\Lambda$, and we have also given explicit 
expressions for the elements of the Hessian matrices $\nabla^2_\lambda N(Y)$ 
and $\nabla^2_\lambda h(Y)$. 

Besides their theoretical interest and inherent beauty, the 
importance of the results presented in this work lies mainly 
in their potential applications, such as the calculation of the 
optimal power allocation to maximize the mutual information 
for a given non-Gaussian constellation, as described in [14]. 



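Appendix I 
Calculation of $\nabla^2_\lambda h(Y)$ 

In this section we are interested in the calculation of 
the elements of the Hessian matrix $[\nabla^2_\lambda h(Y)]_{ij}$, which are 
defined by 

$$[\nabla^2_\lambda h(Y)]_{ij} = \frac{\partial^2 h(Y)}{\partial\lambda_i\, \partial\lambda_j}.$$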

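First of all, using the properties of differential entropy we 
write 

$$h(\Lambda^{1/2} X + W) = h(X + \Lambda^{-1/2} W) + \frac{1}{2} \log |\Lambda|,$$

and recalling that we work with natural logarithms we have 

$$\frac{\partial^2 h(\Lambda^{1/2} X + W)}{\partial\lambda_i\, \partial\lambda_j} = \frac{\partial^2 h(X + \Lambda^{-1/2} W)}{\partial\lambda_i\, \partial\lambda_j} - \frac{\delta_{ij}}{2\lambda_i^2}. \tag{18}$$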



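We are now interested in expanding the first term on the 
right-hand side of the last equation, so we define the diagonal matrix 
$\Gamma = \Lambda^{-1}$ and the random vector $Z = \Lambda^{-1/2} Y$. Thus $[\Gamma]_{ii} = \gamma_i = 1/\lambda_i$ 
and $Z = X + \Lambda^{-1/2} W = X + \Gamma^{1/2} W$. 
Applying the chain rule we obtain 

$$\frac{\partial^2 h(X + \Lambda^{-1/2} W)}{\partial\lambda_i\, \partial\lambda_j} = \frac{1}{\lambda_i^2 \lambda_j^2} \left. \frac{\partial^2 h(X + \Gamma^{1/2} W)}{\partial\gamma_i\, \partial\gamma_j} \right|_{\Gamma = \Lambda^{-1}} + \frac{2\delta_{ij}}{\lambda_i^3} \left. \frac{\partial h(X + \Gamma^{1/2} W)}{\partial\gamma_i} \right|_{\Gamma = \Lambda^{-1}}. \tag{19}$$

The expressions for the two terms 

$$\frac{\partial^2 h(X + \Gamma^{1/2} W)}{\partial\gamma_i\, \partial\gamma_j} \quad \text{and} \quad \frac{\partial h(X + \Gamma^{1/2} W)}{\partial\gamma_i}$$

are given in Appendix II, where we also sketch how they can 
be computed; for further details see [14]. Using these results, 
the right-hand side of the expression in (19) can be rewritten as 

$$\frac{1}{\lambda_i^2 \lambda_j^2} \left[ -\frac{1}{2} \mathsf{E}\left\{ \frac{\left( \mathsf{E}\{X_i X_j \mid Z\} - \mathsf{E}\{X_i \mid Z\}\, \mathsf{E}\{X_j \mid Z\} \right)^2}{\gamma_i^2 \gamma_j^2} \right\} + \delta_{ij} \left( \frac{\mathsf{E}\left\{ \mathsf{E}\{X_i^2 \mid Z\} - (\mathsf{E}\{X_i \mid Z\})^2 \right\}}{\gamma_i^3} - \frac{1}{2\gamma_i^2} \right) \right]_{\Gamma = \Lambda^{-1}} + \frac{2\delta_{ij}}{\lambda_i^3} \left[ \frac{1}{2\gamma_i^2} \left( \gamma_i - \mathsf{E}\left\{ (X_i - \mathsf{E}\{X_i \mid Z\})^2 \right\} \right) \right]_{\Gamma = \Lambda^{-1}}.$$

Simplifying terms we obtain 

$$-\frac{1}{2} \mathsf{E}\left\{ \left( \mathsf{E}\{X_i X_j \mid Z\} - \mathsf{E}\{X_i \mid Z\}\, \mathsf{E}\{X_j \mid Z\} \right)^2 \right\} + \frac{\delta_{ij}}{2\lambda_i^2} + \frac{\delta_{ij}}{\lambda_i} \mathsf{E}\left\{ \mathsf{E}\{X_i^2 \mid Z\} - (\mathsf{E}\{X_i \mid Z\})^2 \right\} - \frac{\delta_{ij}}{\lambda_i} \mathsf{E}\left\{ (X_i - \mathsf{E}\{X_i \mid Z\})^2 \right\}. \tag{20}$$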
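Finally, noting that 

$$\mathsf{E}\left\{ (X_i - \mathsf{E}\{X_i \mid Z\})^2 \right\} = \mathsf{E}\left\{ \mathsf{E}\{X_i^2 \mid Z\} - (\mathsf{E}\{X_i \mid Z\})^2 \right\},$$
$$\mathsf{E}\{f(X) \mid Z\} = \mathsf{E}\{f(X) \mid \Lambda^{1/2} Z\} = \mathsf{E}\{f(X) \mid Y\},$$

and plugging (20) into (18), we obtain the desired result in (13): 

$$\frac{\partial^2 h(Y)}{\partial\lambda_i\, \partial\lambda_j} = -\frac{1}{2} \mathsf{E}\left\{ \left( \mathsf{E}\{X_i X_j \mid Y\} - \mathsf{E}\{X_i \mid Y\}\, \mathsf{E}\{X_j \mid Y\} \right)^2 \right\}.$$

By simple inspection of the entries of the Hessian matrix 
above, the result in (14) can be found. 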

Appendix II 
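Gradient and Hessian of $h(Z = X + \Gamma^{1/2} W)$ 

The elements of the gradient of $h(Z = X + \Gamma^{1/2} W)$ with 
respect to the diagonal elements of $\Gamma$ can be found thanks to 
the complex multivariate de Bruijn's identity found in [22, Th. 
4], adapted to the real case: 

$$\frac{\partial h(X + \Gamma^{1/2} W)}{\partial\gamma_i} = \frac{1}{2} \mathsf{E}\left\{ \left( \frac{\partial \log P_Z(Z)}{\partial z_i} \right)^2 \right\}. \tag{21}$$

The elements of the Hessian matrix can be found quite 
directly from the expressions found in [7, Eq. (50)] and in 
Villani's Lemma in [9] for the single-dimensional second 
derivative $d^2 h(X + \sqrt{t}\, W) / dt^2$ (see [14] for further details 
on the specific generalization to the multidimensional case): 

$$\frac{\partial^2 h(X + \Gamma^{1/2} W)}{\partial\gamma_i\, \partial\gamma_j} = -\frac{1}{2} \mathsf{E}\left\{ \left( \frac{\partial^2 \log P_Z(Z)}{\partial z_i\, \partial z_j} \right)^2 \right\}. \tag{22}$$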

To further elaborate the expressions in (21) and (22), we 
see that we need to compute the gradient and Hessian of 
the function $\log P_Z(z)$. The expression for the gradient has 
already been given in [21, Eq. (56)], [22, Eq. (105)]: 

$$\frac{\partial \log P_Z(z)}{\partial z_i} = \frac{\mathsf{E}\{X_i \mid Z = z\} - z_i}{\gamma_i}. \tag{23}$$

The expression for the Hessian of $\log P_Z(z)$ requires 
slightly more elaboration, and here we only give a sketch; more 
details can be found in [14]. 

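Differentiating (23) with respect to $z_j$ we obtain 

$$\frac{\partial^2 \log P_Z(z)}{\partial z_i\, \partial z_j} = \frac{1}{\gamma_i \gamma_j} \left( \mathsf{E}\{X_i X_j \mid Z = z\} - \mathsf{E}\{X_i \mid Z = z\}\, \mathsf{E}\{X_j \mid Z = z\} \right) - \frac{\delta_{ij}}{\gamma_i}, \tag{24}$$

where we have used that [14] 

$$\frac{\partial \mathsf{E}\{X_i \mid Z = z\}}{\partial z_j} = \frac{1}{\gamma_j} \left( \mathsf{E}\{X_i X_j \mid Z = z\} - \mathsf{E}\{X_i \mid Z = z\}\, \mathsf{E}\{X_j \mid Z = z\} \right).$$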



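Plugging (23) into (21) and operating according to the 
derivation in [22, Eq. (106)] we obtain 

$$\frac{\partial h(X + \Gamma^{1/2} W)}{\partial\gamma_i} = \frac{1}{2\gamma_i^2} \left( \gamma_i - \mathsf{E}\left\{ (X_i - \mathsf{E}\{X_i \mid Z\})^2 \right\} \right).$$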



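Similarly, plugging (24) into (22) we obtain the desired 
expression for the Hessian as 

$$\frac{\partial^2 h(X + \Gamma^{1/2} W)}{\partial\gamma_i\, \partial\gamma_j} = -\frac{1}{2} \mathsf{E}\left\{ \left( \frac{\mathsf{E}\{X_i X_j \mid Z\} - \mathsf{E}\{X_i \mid Z\}\, \mathsf{E}\{X_j \mid Z\}}{\gamma_i \gamma_j} - \frac{\delta_{ij}}{\gamma_i} \right)^2 \right\},$$

which can be expanded as 

$$-\frac{1}{2} \mathsf{E}\left\{ \frac{\left( \mathsf{E}\{X_i X_j \mid Z\} - \mathsf{E}\{X_i \mid Z\}\, \mathsf{E}\{X_j \mid Z\} \right)^2}{\gamma_i^2 \gamma_j^2} \right\} + \delta_{ij} \left( \frac{\mathsf{E}\left\{ \mathsf{E}\{X_i^2 \mid Z\} - (\mathsf{E}\{X_i \mid Z\})^2 \right\}}{\gamma_i^3} - \frac{1}{2\gamma_i^2} \right).$$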